How Hard is Reading Philosophy?: A Brief Analysis of the Readability of 59 Philosophy Texts

screenshot1.png

1. Introduction

From the screenshot above, we can see that many people think philosophy is a difficult subject. Furthermore, according to a blog post from the Blog of the APA, "Of all the major disciplines, philosophy is the least likely to be taught in American primary or secondary schools, either as a core subject required for graduation or as an optional elective for interested and engaged students." In contrast, in France, where I studied before, philosophy is a required subject for all high school students. However, many of my French friends have complained about how difficult it was for them as high school students to understand their philosophy classes.

In this project, I want to explore the question: how hard is philosophy? Specifically, how difficult is it to read philosophy texts? While readability is not the only element that determines how hard a subject is to learn, it is an important one: it affects what level of education is needed to study the subject and whether ordinary people would be interested in learning it.

In addition to answering this question, another objective of this analysis is to help those who are scared of reading philosophy texts because of the public perception of how difficult philosophy is to read. I want to use this analysis to provide some insight into which texts are potentially easier to read for beginners in philosophy.

I will be using the History of Philosophy dataset (for the Philosophy Data Project). According to Kourosh Alizadeh, the contributor of the dataset, "the dataset contains over 300,000 sentences from over 50 texts spanning 10 major schools of philosophy. The represented schools are: Plato, Aristotle, Rationalism, Empiricism, German Idealism, Communism, Capitalism, Phenomenology, Continental Philosophy, and Analytic Philosophy."

2. Data Wrangling

3. What are the readabilities of philosophy texts?

3.1 Defining readability

According to a Wikipedia page about readability, "Readability is the ease with which a reader can understand a written text. In natural language, the readability of text depends on its content (the complexity of its vocabulary and syntax) and its presentation (such as typographic aspects that affect legibility, like font size, line height, character spacing, and line length)." The Flesch reading-ease test is one of the most popular measures of readability. In this project, I will mainly use the score from the Flesch reading-ease test as the reference (a lower score means the text is harder to read). A modified version, the Flesch–Kincaid grade level, will also be calculated.

\begin{equation}Flesch\:reading\:ease\:score = 206.835 - 1.015\times \frac{total\:words}{total\:sentences}-84.6\times \frac{total\:syllables}{total\:words} \end{equation}

\begin{equation}Flesch-Kincaid\:grade\:level = 0.39\times \frac{total\:words}{total\:sentences} + 11.8\times \frac{total\:syllables}{total\:words} - 15.59 \end{equation}

To calculate the scores, we need three components for each text: the total number of sentences, the total number of words, and the total number of syllables.

The total number of sentences can be calculated by counting the rows that belong to a given title, and the total number of words can be calculated by tokenizing each sentence and summing the token counts. The total number of syllables in a text is much harder to calculate. I will use the method introduced in the post, which estimates the number of syllables in a word by counting the "stress markers" in its entry in a pronouncing dictionary (cmudict). While this estimate is generally accurate, its main drawback is that some words are not contained in cmudict. In that case, I will use the syllables package to estimate the number of syllables.
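The stress-marker idea can be sketched roughly as follows. This is only a minimal illustration, not the exact code used in the analysis: `SAMPLE_PRONUNCIATIONS` is a tiny hand-written stand-in for a CMUdict-style dictionary (in practice it could be loaded with something like `nltk.corpus.cmudict.dict()`), and `fallback_syllables` is a hypothetical vowel-group heuristic standing in for the syllables package.

```python
import re

# Tiny stand-in for a CMUdict-style pronouncing dictionary (assumption: the
# real dictionary maps a lowercase word to a list of pronunciations, and
# every vowel phoneme ends in a stress digit 0, 1, or 2).
SAMPLE_PRONUNCIATIONS = {
    "philosophy": [["F", "AH0", "L", "AA1", "S", "AH0", "F", "IY0"]],
    "reason": [["R", "IY1", "Z", "AH0", "N"]],
    "the": [["DH", "AH0"], ["DH", "IY0"]],
}


def fallback_syllables(word):
    """Crude estimate for out-of-dictionary words: count vowel groups.

    Hypothetical helper; it only stands in for a dedicated estimator.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def count_syllables(word, pronunciations=SAMPLE_PRONUNCIATIONS):
    """Count stress markers in the first listed pronunciation.

    Falls back to the heuristic when the word is not in the dictionary.
    """
    entry = pronunciations.get(word.lower())
    if entry:
        # Each phoneme ending in a digit is a vowel, i.e. one syllable.
        return sum(1 for phoneme in entry[0] if phoneme[-1].isdigit())
    return fallback_syllables(word)
```

Using the first listed pronunciation is a simplification; words with several pronunciations can differ in syllable count.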

3.2 Calculating readability scores

Now I will use the following formulas to calculate the readability scores:

\begin{equation}Flesch\:reading\:ease\:score = 206.835 - 1.015\times \frac{total\:words}{total\:sentences}-84.6\times \frac{total\:syllables}{total\:words} \end{equation}

\begin{equation}Flesch-Kincaid\:grade\:level = 0.39\times \frac{total\:words}{total\:sentences} + 11.8\times \frac{total\:syllables}{total\:words} - 15.59 \end{equation}
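As a minimal sketch, the two formulas translate directly into Python. The function and parameter names here are my own choices for illustration; the three totals are assumed to have been computed per text beforehand.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Flesch reading ease score: higher means easier to read."""
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Flesch-Kincaid grade level: higher means harder to read."""
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Worked example with made-up totals: 100 words, 5 sentences, 150 syllables.
ease = flesch_reading_ease(100, 5, 150)    # 206.835 - 20.3 - 126.9
grade = flesch_kincaid_grade(100, 5, 150)  # 7.8 + 17.7 - 15.59
```

Both functions assume non-zero word and sentence counts; an empty text would need to be filtered out first.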

We can see that the range of readability is actually very wide for the 59 texts. The readability score ranges from 13.0 to 75.4, meaning that while some texts are extremely difficult to understand, some are actually relatively easy. If you want to start reading philosophy, you can begin with the texts that have higher readability scores! The specific interpretation of the score is shown below.

3.3 Interpreting readability scores

The interpretation of the Flesch reading ease score: image.png
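The mapping from score to reading level can also be written as a small lookup. The bands below are the commonly cited ones for the Flesch reading ease score; I am assuming they match the table in the image, and the labels are my own wording.

```python
# (lower bound of the band, label), checked from the easiest band down.
# Assumption: these are the commonly cited Flesch reading ease bands.
SCORE_BANDS = [
    (90.0, "5th grade"),
    (80.0, "6th grade"),
    (70.0, "7th grade"),
    (60.0, "8th and 9th grade"),
    (50.0, "10th to 12th grade"),
    (30.0, "College"),
    (10.0, "College graduate"),
]


def reading_level(score):
    """Return the school-level label for a Flesch reading ease score."""
    for lower_bound, label in SCORE_BANDS:
        if score >= lower_bound:
            return label
    # Anything below 10 falls into the hardest band.
    return "Professional"
```

For example, `reading_level(75.4)` gives "7th grade" and `reading_level(13.0)` gives "College graduate", which are the two extremes among the 59 texts.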

From the graph, we can see that in general, the philosophy texts are hard to read (61% of them are college-level readings). This level of difficulty partially explains why American high schools rarely offer philosophy classes and why French high school students find the class content difficult to understand. However, surprisingly, only 2 texts are at the college graduate level and none are at the professional level! This means that while reading the philosophy texts shouldn't be difficult for many of us (college students), fully understanding them might be the real reason behind people's perception of philosophy as a difficult subject. The 5 most difficult texts based on the readability test are:

Furthermore, many of the texts have a high school readability level. Although in reality they might be a bit harder to read because of their abstract content, they are probably a good place to start if you want to learn more about philosophy. The 5 easiest texts based on the readability test are:

3.4 The hardest phrases from a selection of texts

Even though we have the Flesch reading ease score for each text, we can also use word clouds to show some of the hardest phrases in each text, which gives a more direct picture of each work's sentence complexity.

I will use the ratio of the number of syllables to the number of words in each sentence as the indicator of that sentence's complexity.
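A rough sketch of this indicator is below. The regex tokenizer and the vowel-group `estimate_syllables` helper are simplifying assumptions made so the example is self-contained; any syllable counter can be passed in through `syllable_counter` instead.

```python
import re


def estimate_syllables(word):
    """Crude syllable estimate: count vowel groups (hypothetical helper)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def sentence_complexity(sentence, syllable_counter=estimate_syllables):
    """Average number of syllables per word in one sentence."""
    words = re.findall(r"[A-Za-z']+", sentence)
    if not words:
        return 0.0
    return sum(syllable_counter(word) for word in words) / len(words)


def hardest_sentences(sentences, n=3, syllable_counter=estimate_syllables):
    """Return the n sentences with the highest syllables-per-word ratio."""
    ranked = sorted(
        sentences,
        key=lambda s: sentence_complexity(s, syllable_counter),
        reverse=True,
    )
    return ranked[:n]
```

Note that this ratio ignores sentence length, so a short sentence made of long words ranks above a long sentence made of short ones.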

I want to show what the hardest sentences in each text look like, to give the audience a better impression of each text's level of difficulty. However, due to limited space, I will use only every 4th text (ranked by Flesch reading ease score) as a source for the word clouds.
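Picking every 4th text from the ranking could look something like this. It is only a sketch: I am assuming the ranking runs from easiest to hardest and starts with the top-ranked text, and the titles and scores in the example are made-up placeholders rather than values from the dataset.

```python
def select_every_nth(scores_by_title, step=4):
    """Rank titles by score (highest first) and keep every `step`-th one."""
    ranked_titles = sorted(scores_by_title, key=scores_by_title.get, reverse=True)
    # Slicing with a step keeps positions 0, step, 2 * step, ...
    return ranked_titles[::step]


# Placeholder titles and scores, for illustration only.
example_scores = {"Text A": 70.0, "Text B": 60.0, "Text C": 50.0,
                  "Text D": 40.0, "Text E": 30.0}
selected = select_every_nth(example_scores, step=4)
```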

If you want to read any of these texts, you can check whether you understand its hardest sentences before deciding whether to read the book.

A brief skim through all the word clouds shows that, indeed, texts with higher reading ease scores generally have less complex sentence structures, and their word choices are also easier to understand.

4. Readability by categories

Having seen the readability of each text, someone might ask: if the text I want to read is not among those 59, how do I know whether it is difficult to read? To answer this question, I will plot readability against original publication date, school of thought, and author to show the relationship between readability and these categories.

4.1 Readability over time

I want to see whether the original publication date is a good indicator of a text's readability. My guess is that readability changes over time because writing styles change over time. I will plot the readability score against the original publication date to see whether my guess is true.

In this case, we can already see that, unfortunately, the original publication date is probably not a good indicator of a work's readability. However, I still want to focus on the works published after 1600 to see whether a pattern exists, since the majority of the works in our list were published after 1600.

Again, even after limiting the texts to those originally published after 1600, we still don't see a clear pattern between the readability score and the original publication year. After fitting a linear regression, we see that $R^2$ is only 0.0244, which means the publication year explains only about 2.4% of the variation in readability.
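For reference, the $R^2$ of a simple linear regression can be computed by hand as in the sketch below. This is a plain-Python illustration under the usual least-squares definitions, not necessarily the library call used in the analysis, and the years and scores in the example are made-up placeholders.

```python
def r_squared(xs, ys):
    """R^2 of the least-squares line fitted to the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("R^2 is undefined when x or y has no variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / ss_tot


# Placeholder publication years and scores, for illustration only.
years = [1650, 1750, 1850, 1950]
scores = [50.0, 40.0, 40.0, 50.0]
fit_quality = r_squared(years, scores)
```

A value close to 0, as in this placeholder example, means a straight line in the publication year explains almost none of the variation in the scores.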

4.2 Readability of texts from different schools

We can see that texts from the Plato, Stoicism, Aristotle, Nietzsche, and Analytic schools of thought have relatively higher readability, while texts from the Continental, Communism, and German Idealism schools are among the hardest to read.
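A comparison like this boils down to averaging the per-text scores within each group. Here is a minimal sketch; the record layout with `school` and `score` keys is my assumption, and the example records are placeholders rather than real results. The same helper would work for any other grouping key, such as the author.

```python
from collections import defaultdict


def mean_score_by_group(records, key):
    """Average the 'score' field of the records within each value of `key`."""
    scores = defaultdict(list)
    for record in records:
        scores[record[key]].append(record["score"])
    return {group: sum(values) / len(values) for group, values in scores.items()}


# Placeholder records, for illustration only.
example_records = [
    {"title": "Text A", "school": "School X", "score": 70.0},
    {"title": "Text B", "school": "School X", "score": 60.0},
    {"title": "Text C", "school": "School Y", "score": 30.0},
]
school_means = mean_score_by_group(example_records, key="school")
```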

4.3 Readability of texts by different authors

We can see that texts by Wittgenstein, Plato, Epictetus, and Marcus Aurelius have relatively higher readability, while the texts by Descartes, Lenin, Husserl, Kant, and many others are much harder to read.

5. Conclusion

As the analysis demonstrates, the majority (64.3%) of the texts require at least a college-level education to read, based on the Flesch reading ease score. The average score of 46.0 points corresponds to a college-level education (in comparison, Time magazine scores 52 points). I therefore conclude that, in general, philosophy texts are difficult to understand.

However, not all philosophy texts are difficult to read based on their Flesch reading ease scores. The 59 texts in the dataset have a wide spread of reading ease scores, from 75.4 (7th grade) to 13.0 (college graduate). So people with less experience reading philosophy texts (even high school students) can start with those that are easier to understand, and in theory they will be able to read some of them.

5.1 Limitations

This dataset contains only 59 texts, and some of the sentences are not separated properly, which can affect the results. In addition, many of the words with a lot of syllables are names, and in philosophy texts those names tend to be repeated many times. However, given the quantity of data available, the results of this analysis should still be relatively accurate.

Another limitation is that many of the texts are translations, so their readability also depends on the particular translator's style. Readers may be able to find other translations with easier word choices and styles.

The biggest limitation of this analysis is the distinction between understanding and reading. Although the analysis gives some indication of these texts' reading ease based on a particular formula, the reading ease score does not directly translate into how difficult it is to fully understand a text, especially considering the abstract nature of philosophy texts. However, if you are scared of reading philosophy texts, don't be! You can use the findings of this analysis as a guide when choosing your first text! You will only know whether you can understand a text if you actually start reading it yourself!